ABSTRACT



A method and system within a processor are disclosed for
executing selected instructions among a number of instructions
stored within a memory, wherein the processor has a maximum
number of instructions that can be dispatched for execution
during each processor cycle. A subset of the instructions is
fetched from the memory for execution. A determination is then
made whether the set of instructions includes an unresolved
branch instruction. In response to a determination that the set
of instructions includes an unresolved branch instruction, a
prediction is made whether a branch indicated by the branch
instruction will be taken or will not be taken. In response to
a prediction that the branch will be taken, a nonsequential
target instruction indicated by the branch instruction is
fetched from memory. A determination is made whether the
maximum number of instructions can be dispatched for execution
during a processor cycle subsequent to the branch prediction
without dispatching instructions within the sequential
execution path. In response to a determination that less than
the maximum number of target instructions can be dispatched in
the processor cycle subsequent to the branch prediction without
dispatching instructions within the sequential execution path,
an instruction within the sequential execution path is
speculatively dispatched for execution. In response to
refutation of the branch prediction, the fetch of the
nonsequential target instruction is cancelled and the
instruction within the sequential execution path is executed,
thereby minimizing a performance penalty incurred by the
processor due to the mispredicted branch.

18 Claims, 4 Drawing Sheets

METHOD AND SYSTEM FOR MINIMIZING
BRANCH MISPREDICTION PENALTIES
WITHIN A PROCESSOR

BACKGROUND OF THE INVENTION 

1. Technical Field 

The present invention relates in general to a method and 
system for data processing and in particular to a method and 
system for executing instructions within a processor. Still 
more particularly, the present invention relates to a method 
and system for executing instructions within a processor
such that the branch misprediction penalty incurred when a
branch is incorrectly predicted as taken is minimized.

2. Description of the Related Art 

A conventional high-performance processor includes an
instruction cache for storing instructions, an instruction
buffer for temporarily storing instructions fetched from the
instruction cache for execution, a number of execution units
for executing sequential instructions, a branch processing
unit for executing branch instructions, a dispatch unit for
dispatching sequential instructions from the instruction buffer
to particular ones of the execution units, and a completion
buffer for temporarily storing instructions that have finished
execution, but have not been completed.

As is well-known in the art, sequential instructions
fetched from the instruction queue are stored within the
instruction buffer pending dispatch to the execution units. In
contrast, branch instructions fetched from the instruction
cache are typically forwarded directly to the branch processing
unit for execution. In some cases, the condition register value
upon which a conditional branch depends can be ascertained
prior to executing the branch instruction, that is, the branch
can be resolved prior to execution. If a branch is resolved as
taken prior to execution, instructions at the target address of
the branch instruction are fetched and executed by the
processor. In addition, any sequential instructions following
the branch that have been prefetched are discarded. However,
the outcome of a branch instruction often cannot be determined
prior to executing the branch instruction due to a condition
register dependency. When a branch instruction remains
unresolved at execution, the branch processing unit utilizes a
prediction mechanism, such as a branch history table, to
predict which execution path should be taken. In conventional
processors, the dispatch of sequential instructions following a
branch predicted as taken is halted and instructions within the
speculative target instruction stream are fetched during the
next processor cycle. If the branch that was predicted as taken
is resolved as mispredicted, a mispredict penalty is incurred
by the processor due to the cycle time required to restore the
sequential execution stream following the branch instruction.

Referring now to FIGS. 4a-4b, there is depicted an
example illustrating the mispredict penalty incurred when a
branch instruction is incorrectly predicted as taken. In FIG.
4a, an instruction sequence is illustrated which includes a
conditional branch instruction (BC) that branches to a target
instruction (T0) based upon a condition register value
generated by a compare instruction (CMP). The instruction
sequence depicted in FIG. 4a also includes 4 sequential
instructions S0-S3. A timing diagram depicting the execution
of the instruction sequence within a conventional processor
having a fetch bandwidth of 4 instructions and a dispatch
bandwidth of 2 instructions is illustrated in FIG. 4b.

In cycle 1 of FIG. 4b, instructions S0, CMP, S1, and BC
are fetched from the instruction cache and stored within the
instruction buffer. During cycle 2, the 4 subsequent sequential
instructions (S2, S3, S4, and S5) are fetched while
instructions S0 and CMP are dispatched to the execution units
for execution. In addition, the conditional branch BC is
predicted as taken in cycle 2. Consequently, target
instructions T0 and T1 are fetched in cycle 3. During cycle 3,
the branch instruction also resolves as incorrectly predicted,
since CMP finishes execution during the cycle. Because BC was
predicted as taken in cycle 2, only sequential instructions
preceding BC are dispatched in cycle 3. Since the correct
current fetch address is not restored until cycle 4, the
correct sequential instructions cannot be executed by the
execution units until cycle 6. Thus, as illustrated in FIG. 4b,
the processor incurs a mispredict penalty between the execution
of sequential instruction S1 and the execution of sequential
instructions S2 and S3. The mispredict penalty, which is
defined as the number of cycles that the execution units are
idle or executing instructions within the mispredicted path,
delays the execution of S2 by two cycles and the execution of
S3 by one cycle, resulting in an average mispredict penalty of
1.5 cycles. A half cycle penalty is incurred during cycle 4
since only one instruction is executed out of the two
instructions that could be executed during that cycle.

Because of the performance penalty associated with the
misprediction of an unresolved branch as taken, it would be
desirable to provide an improved method and system for
executing instructions that minimize the branch misprediction
penalty incurred in cases in which a branch is incorrectly
predicted as taken.
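
The average penalty quoted for the FIG. 4b example can be checked with a
short calculation. The sketch below is illustrative only: the function
name and the dictionary of per-instruction delays are assumptions
introduced here, while the delay values (two cycles for S2, one cycle
for S3) are taken from the description above.

```python
def average_mispredict_penalty(delays):
    """Mean delay, in cycles, over the instructions held up by a misprediction."""
    return sum(delays.values()) / len(delays)


# Delays stated for the conventional processor of FIG. 4b.
fig_4b_delays = {"S2": 2, "S3": 1}
print(average_mispredict_penalty(fig_4b_delays))  # 1.5
```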

SUMMARY OF THE INVENTION

It is therefore one object of the present invention to
provide an improved method and system for data processing.

It is another object of the present invention to provide an
improved method and system for executing instructions
within a processor.

It is yet another object of the present invention to provide
an improved method and system for executing instructions
such that the branch misprediction penalty incurred when a
branch is incorrectly predicted as taken is minimized.

The foregoing objects are achieved as is now described.
A method and system within a processor are disclosed for
executing selected instructions among a number of instructions
stored within a memory, wherein the processor has a maximum
number of instructions that can be dispatched for execution
during each processor cycle. A subset of the instructions is
fetched from the memory for execution. A determination is then
made whether the set of instructions includes an unresolved
branch instruction. In response to a determination that the set
of instructions includes an unresolved branch instruction, a
prediction is made whether a branch indicated by the branch
instruction will be taken or will not be taken. In response to
a prediction that the branch will be taken, a nonsequential
target instruction indicated by the branch instruction is
fetched from memory. A determination is made whether the
maximum number of instructions can be dispatched for execution
during a processor cycle subsequent to the branch prediction
without dispatching instructions within the sequential
execution path. In response to a determination that less than
the maximum number of target instructions can be dispatched in
the processor cycle subsequent to the branch prediction without
dispatching instructions within the sequential execution path,
an instruction within the sequential execution path is
speculatively dispatched for execution. In response to
refutation of the branch prediction, the fetch of the
nonsequential target instruction is cancelled and the
instruction within the sequential execution path is executed,
thereby minimizing a performance penalty incurred by the
processor due to the mispredicted branch.
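
The dispatch decision summarized above can be sketched in software as
follows. This is a minimal model under stated assumptions, not the
hardware logic of the invention: the names `Dispatched` and
`select_dispatch`, and the representation of instructions as strings,
are hypothetical. The sketch fills dispatch slots first with sequential
instructions preceding the branch and with any target instructions
already fetched, and only then speculatively dispatches sequential
instructions that follow the branch.

```python
from dataclasses import dataclass


@dataclass
class Dispatched:
    name: str
    # Set only for sequential instructions dispatched past a branch
    # that was predicted as taken (hypothetical field name).
    speculative: bool = False


def select_dispatch(max_dispatch, before_branch, targets_ready, after_branch):
    """Choose up to max_dispatch instructions for one processor cycle."""
    # Candidates that do not lie in the sequential path past the branch.
    chosen = [Dispatched(n) for n in (before_branch + targets_ready)[:max_dispatch]]
    # Slots that would otherwise leave execution units idle.
    free_slots = max_dispatch - len(chosen)
    chosen += [Dispatched(n, speculative=True) for n in after_branch[:free_slots]]
    return chosen


# A case with dispatch bandwidth 2, one instruction left before the
# branch, and target instructions still being fetched.
cycle = select_dispatch(2, ["S1"], [], ["S2", "S3", "S4", "S5"])
print([(d.name, d.speculative) for d in cycle])  # [('S1', False), ('S2', True)]
```

When enough non-speculative candidates exist to fill every slot, no
sequential instruction following the branch is dispatched.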

The above as well as additional objects, features, and
advantages of the present invention will become apparent in
the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention
are set forth in the appended claims. The invention itself
however, as well as a preferred mode of use, further objects
and advantages thereof, will best be understood by reference
to the following detailed description of an illustrative
embodiment when read in conjunction with the accompanying
drawings, wherein:

FIG. 1 illustrates a block diagram of a preferred embodiment
of a processor which employs the method and system of the
present invention;

FIG. 2 is a flowchart depicting a method according to the
present invention for executing instructions such that the
branch misprediction penalty incurred when a branch is
incorrectly predicted as taken is minimized;

FIGS. 3a-3b illustrate an example of the execution of a
sequence of instructions including a conditional branch
instruction, wherein the branch misprediction penalty
incurred when the branch is mispredicted as taken is minimized
according to the method and system of the present invention;
and

FIGS. 4a-4b depict a prior art example of the execution
of a sequence of instructions including a conditional branch
instruction, wherein the processor executing the sequence of
instructions incurs a branch misprediction penalty.

DETAILED DESCRIPTION OF PREFERRED
EMBODIMENT

With reference now to the figures and in particular with
reference to FIG. 1, there is illustrated a block diagram of a
processor, indicated generally at 10, for processing
information according to a preferred embodiment of the present
invention. In the depicted embodiment, processor 10 comprises a
single integrated circuit superscalar microprocessor.
Accordingly, as discussed further below, processor 10
includes various execution units, registers, buffers,
memories, and other functional units, which are all formed
by integrated circuitry. In a preferred embodiment of the
present invention, processor 10 comprises one of the
PowerPC™ line of microprocessors, which operates according
to reduced instruction set computing (RISC) techniques. As
depicted in FIG. 1, processor 10 is coupled to system bus 11
via a bus interface unit (BIU) 12 within processor 10. BIU
12 controls the transfer of information between processor 10
and other devices coupled to system bus 11, such as a main
memory (not illustrated). Processor 10, system bus 11, and
the other devices coupled to system bus 11 together form a
host data processing system.

BIU 12 is connected to instruction cache 14 and data
cache 16 within processor 10. High speed caches, such as
instruction cache 14 and data cache 16, enable processor 10
to achieve relatively fast access time to a subset of data or
instructions previously transferred from main memory to
caches 14 and 16, thus improving the speed of operation of
the host data processing system. Instruction cache 14 is
further coupled to sequential fetcher 17, which fetches
instructions from instruction cache 14 during each cycle for
execution. Sequential fetcher 17 transmits branch instructions
fetched from instruction cache 14 to branch processing
unit (BPU) 18 for execution, but stores sequential
instructions within instruction queue 19 for execution by other
execution circuitry within processor 10.

In the depicted embodiment, in addition to BPU 18, the
execution circuitry of processor 10 comprises multiple
execution units, including fixed-point unit (FXU) 22,
load/store unit (LSU) 28, and floating-point unit (FPU) 30. As
is well-known to those skilled in the computer arts, each of
execution units 22, 28, and 30 executes one or more
instructions within a particular class of sequential
instructions during each processor cycle. For example, FXU 22
performs fixed-point mathematical operations such as addition,
subtraction, ANDing, ORing, and XORing, utilizing source
operands received from specified general purpose registers
(GPRs) 32. Following the execution of a fixed-point
instruction, FXU 22 outputs the data results of the instruction
to GPR rename buffers 33, which provide temporary
storage for the result data until the instruction is completed
by transferring the result data from GPR rename buffers 33
to one or more of GPRs 32. Conversely, FPU 30 performs
floating-point operations, such as floating-point
multiplication and division, on source operands received from
floating-point registers (FPRs) 36. FPU 30 outputs data
resulting from the execution of floating-point instructions to
selected FPR rename buffers 37, which temporarily store the
result data until the instructions are completed by
transferring the result data from FPR rename buffers 37 to
selected FPRs 36. As its name implies, LSU 28 executes
floating-point and fixed-point instructions which either load
data from memory (i.e., either data cache 16 or main memory)
into selected GPRs 32 or FPRs 36 or which store data from
a selected one of GPRs 32 or FPRs 36 to memory.

Processor 10 employs both pipelining and out-of-order
execution of instructions to further improve the performance
of its superscalar architecture. Accordingly, instructions can
be executed by FXU 22, LSU 28, and FPU 30 in any order
as long as data dependencies are observed. In addition,
instructions are processed by each of FXU 22, LSU 28, and
FPU 30 at a sequence of pipeline stages. As is typical of
high-performance processors, each instruction is processed
at five distinct pipeline stages, namely, fetch,
decode/dispatch, execute, finish, and completion.

During the fetch stage, sequential fetcher 17 retrieves one
or more instructions associated with one or more memory
addresses from instruction cache 14. Sequential instructions
fetched from instruction cache 14 are stored by sequential
fetcher 17 within instruction queue 19. In contrast, sequential
fetcher 17 removes branch instructions from the instruction
stream and forwards them to BPU 18 for execution.
According to the present invention, BPU 18 includes a
branch prediction mechanism, which in a preferred embodiment
comprises a dynamic prediction mechanism such as a
branch history table, that enables BPU 18 to speculatively
execute unresolved conditional branch instructions by
predicting whether the branch will be taken. Alternatively, in
other embodiments of the present invention a static,
compiler-based prediction mechanism can be implemented.
As will be described in greater detail below, the present
invention minimizes the branch misprediction penalty
incurred by processor 10 in cases in which a branch is
incorrectly predicted as taken.

During the decode/dispatch stage, dispatch unit 20
decodes and dispatches one or more instructions from
instruction queue 19 to the appropriate ones of execution
units 22, 28, and 30. Also during the decode/dispatch stage,
dispatch unit 20 allocates a rename buffer within GPR
rename buffers 33 or FPR rename buffers 37 for each
dispatched instruction's result data. According to a preferred
embodiment of the present invention, processor 10 dispatches
instructions in program order and tracks the program
order of the dispatched instructions during out-of-order
execution utilizing unique instruction identifiers. In addition
to an instruction identifier, each instruction within the
execution pipeline of processor 10 has an associated valid bit,
which indicates [illegible in source], and a speculative bit
that indicates whether the instruction is within a speculative
execution path. If a speculative execution path is resolved as
incorrect, instructions within the speculative path are flushed
from processor 10 by clearing (resetting) the valid bit
associated with instructions that have a set speculative bit.

During the execute stage, execution units 22, 28, and 30
execute instructions received from dispatch unit 20 as soon
as the source operands for the indicated operations are
available. After execution has terminated, execution units
22, 28, and 30 store data results within either GPR rename
buffers 33 or FPR rename buffers 37, depending upon the
instruction type. Then, execution units 22, 28, and 30 signal
completion unit 40 that the execution unit has finished an
instruction. Finally, instructions are completed in program
order by transferring result data from GPR rename buffers
33 or FPR rename buffers 37 to GPRs 32 or FPRs 36,
respectively.

Referring now to FIG. 2, there is depicted a flowchart of
a method of executing instructions according to the present
invention which minimizes the branch misprediction penalty
incurred when a branch is mispredicted as taken. The
method illustrated in FIG. 2 will be described with reference
to FIGS. 3a-3b, which depict an exemplary sequence of
instructions and timing diagram of the execution of the
instructions. The sequence of instructions illustrated in FIG.
3a is identical to the prior art sequence of instructions
depicted in FIG. 4a and accordingly illustrates the benefit of
the present invention by comparison thereto.

Referring first to FIG. 2, the process begins at block 50
and thereafter proceeds to block 52, which depicts sequential
fetcher 17 fetching the next group of instructions from
instruction cache 14. The fetch performed at block 52 is
illustrated in FIG. 3b at cycle 1, where instructions S0, CMP,
S1, and branch instruction BC are fetched. Next, the process
proceeds to block 54, which illustrates determining whether
the set of instructions fetched at block 52 includes an
unresolved branch instruction. If not, a determination is made
at block 56 whether the set of instructions fetched at block 52
includes a resolved branch instruction. If the set of
instructions includes neither an unresolved nor a resolved
branch instruction, the process proceeds to block 76, which
depicts continuing the fetching, dispatching, and execution of
sequential instructions. However, if a determination is made
at block 56 that the set of instructions includes a resolved
taken branch instruction, the process passes to block 58,
which illustrates processor 10 fetching and executing
instructions within the target instruction stream after
executing sequential instructions preceding the resolved branch
instruction. Thereafter, the process terminates at block 78.

Returning to block 54, if the set of instructions fetched
from instruction cache 14 includes an unresolved branch
instruction, as does the set fetched during cycle 1 in FIG. 3b,
sequential fetcher 17 transfers the unresolved branch
instruction to BPU 18. Next, the process passes to block 60,
which depicts BPU 18 predicting whether the branch will be
taken or not. Referring again to FIG. 3b, the branch
instruction is predicted as taken during cycle 2. At cycle 2,
the 4 following sequential instructions (S2-S5) are also
fetched. Returning to FIG. 2, if the branch is predicted as not
taken, the process proceeds from block 60 through block 62 to
block 64, which depicts processor 10 executing sequential
instructions following the branch, unless the branch
instruction later resolves as incorrect. Thereafter, the
process terminates at block 78.

However, if the branch is predicted as taken, as illustrated
in FIGS. 3a-3b, the process proceeds from block 60 through
block 62 to block 66, which depicts fetching the target
instructions indicated by the branch and speculatively
dispatching sequential instructions previously fetched if
execution units 22, 28, and 30 would otherwise be idle.
Typically, high-performance processors, such as processor 10 of
FIG. 1, achieve 80 percent to 90 percent accuracy in branch
predictions. Because of the great likelihood that the predicted
branch will resolve correctly, prior art processors flush
fetched sequential instructions following the predicted
branch, as illustrated in FIGS. 4a-4b, even if the execution
units will be idle. In contrast, according to the present
invention, sequential instructions following the branch that
have already been fetched are speculatively dispatched to
the execution units if the execution units would otherwise be
idle. Thus, the present invention permits instructions which
have a 10-20 percent probability of successful execution to
be dispatched pending resolution of a branch, thereby
eliminating idle execution unit cycles. As will be appreciated
by those skilled in the art, as instruction queue 19 becomes
larger, the percentage of branches predicted near the top of
instruction queue 19 increases. Predicting branches close to
the top of instruction queue 19 provides sufficient cycle time
for processor 10 to remove sequential instructions following
the branch from instruction queue 19 and to replace the
sequential instructions with target instructions. Thus, in
processors utilizing larger instruction queues, fewer
sequential instructions are speculatively dispatched according
to the method of the present invention since execution units
will not be idle while waiting for target instructions to be
fetched from instruction cache 14.

Referring to cycle 3 of FIG. 3b, because target instructions
have not already been fetched, sequential instruction
S2 is speculatively dispatched for execution by one of
execution units 22, 28, and 30. In addition, target
instructions T0 and T1 are fetched from instruction cache 14.
Furthermore, branch instruction BC is resolved as mispredicted
during cycle 3 since the compare instruction CMP
finishes execution, thereby providing the condition register
value utilized to resolve BC.

Returning to FIG. 2, the process proceeds from block 66
to block 68, which illustrates determining whether the
branch instruction resolved correctly. If the branch
instruction resolves correctly, [illegible] speculative
sequential [illegible] from execution units 22, 28,
[illegible] associated with sequential instructions
[illegible] bit [illegible] instructions. The process then
proceeds to block 72, which illustrates dispatch unit 20
dispatching the target instruction stream on the following
cycle. The process then terminates at block 78.

Returning to block 68, if the branch instruction resolves
as mispredicted, the process proceeds to block 74, which
depicts cancelling the fetch of the target instructions and
executing the sequential instructions which were speculatively
dispatched. Since the sequential instructions are no
longer within a speculative execution path, the speculative
bits associated with the sequential instructions are also
cleared at block 74. The process then proceeds to block 76,
which depicts continuing the sequential execution stream.
Thereafter, the process terminates at block 78. Referring
again to FIG. 3b, the steps depicted at blocks 68-76 are
illustrated within cycles 4-[illegible]. It is important to
note in comparison with FIG. 4b, that instructions S1 and S2
are executed during cycle 4 and that two instructions within
the sequential path are executed during each subsequent cycle.
Thus, in the example depicted in FIG. 3b, the present
invention eliminates the branch misprediction penalty.

As has been described, the present invention provides an
improved method and system for executing instructions such
that the branch misprediction penalty incurred when a
branch is mispredicted as taken is minimized. According to
the present invention, sequential instructions following a
branch instruction that is predicted as taken are speculatively
dispatched to execution units that would otherwise be idle,
thus minimizing the number of cycles required to recover
the sequential execution path if the branch instruction is
resolved incorrectly. Because the present invention enhances
the utilization of already available resources, the present
invention enhances processor performance with little
additional hardware cost or processing overhead.
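
The handling of a branch predicted as taken, once it resolves, can be
modeled in a few lines. The sketch below is a hedged software
illustration, not the circuitry of processor 10: the names `Entry` and
`resolve_predicted_taken`, the list-based pipeline, and the use of a
per-instruction valid flag for cancellation are assumptions made for
illustration. It shows the two outcomes described above: a correct
prediction cancels the speculatively dispatched sequential instructions,
while a misprediction cancels the target fetch and clears the
speculative bits so that the sequential instructions proceed.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    valid: bool = True         # cleared to cancel the instruction
    speculative: bool = False  # set for sequential instructions past the branch


def resolve_predicted_taken(pipeline, pending_target_fetch, prediction_correct):
    """Update pipeline entries in place; return the target fetch still pending."""
    if prediction_correct:
        # Branch really was taken: cancel speculative sequential work.
        for entry in pipeline:
            if entry.speculative:
                entry.valid = False
        return pending_target_fetch
    # Mispredicted: the sequential path is now the correct path.
    for entry in pipeline:
        entry.speculative = False
    return []  # target fetch cancelled


pipeline = [Entry("S1"), Entry("S2", speculative=True)]
remaining = resolve_predicted_taken(pipeline, ["T0", "T1"], prediction_correct=False)
print(remaining, [(e.name, e.valid, e.speculative) for e in pipeline])
# [] [('S1', True, False), ('S2', True, False)]
```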

While the invention has been particularly shown and
described with reference to a preferred embodiment, it will 
be understood by those skilled in the art that various changes 
in form and detail may be made therein without departing 
from the spirit and scope of the invention. 

What is claimed is:

1. A method for executing instructions within a processor,
said processor having a memory which stores a plurality of
instructions arranged in a sequential order, wherein said
processor has a predetermined maximum number of instructions
that can be dispatched for execution during each
processor cycle, said method comprising:

fetching a subset of said plurality of sequential instructions
from said memory for execution by said processor;

determining whether said subset of said plurality of
sequential instructions includes an unresolved branch
instruction;

in response to a determination that said subset of said
plurality of sequential instructions includes an unresolved
branch instruction, predicting if a branch indicated
by said branch instruction will be taken;

in response to said prediction, fetching at least one
nonsequential target instruction indicated by said
branch instruction from said memory;

determining whether or not said maximum predetermined
number of instructions can be dispatched for execution
during a processor cycle following said branch prediction
from among sequential instructions preceding said
branch instruction and said at least one target instruction
without dispatching any sequential instructions
which follow said branch instruction in said sequential
order;

in response to a determination that said predetermined
maximum number of instructions can be dispatched
during said processor cycle following said branch prediction
from among sequential instructions preceding
said branch instruction and said at least one target
instruction without dispatching any sequential instructions
which follow said branch instruction, dispatching
said predetermined maximum number of instructions
for execution;

in response to a determination that said predetermined
maximum number of instructions cannot be dispatched
during said processor cycle following said branch prediction
without dispatching any sequential instructions
which follow said branch instruction, speculatively
dispatching a sequential instruction which follows said
branch instruction for execution;

in response to refutation of said branch prediction,
cancelling said fetch of said target instruction; and
executing said sequential instruction which follows
said branch instruction, wherein a performance penalty
for a mispredicted branch is minimized.

2. The method for executing instructions within a processor
of claim 1, wherein each speculatively dispatched
instruction is identified by a state of an associated
speculative indicator, said method further comprising:

in response to refutation of said branch prediction, resetting
a speculative indicator associated with said
sequential instruction which follows said branch
instruction.

3. The method for executing instructions within a processor
of claim 1, said method further comprising:

in response to refutation of said branch, fetching additional
sequential instructions according to said sequential
order.

4. The method for executing instructions within a proces- 
sor of claim 1, said method further comprising: 

in response to resolution of said branch as taken, cancelling
said sequential instruction which follows said
branch instruction path; and

halting dispatch of subsequent sequential instructions. 

5. The method for executing instructions within a proces- 
sor of claim 1, said method further comprising: 

in response to a failure to resolve said branch prediction
during a processor cycle in which said at least one 
target instruction is fetched, cancelling said sequential 
instruction which follows said branch instruction; and 

halting dispatch of subsequent sequential instructions. 

6. The method for executing instructions within a proces- 
sor of claim 1, wherein said step of predicting comprises 
dynamic branch prediction. 

7. A system for executing instructions within a processor, 
said processor having a memory which stores a plurality of 
instructions arranged in a sequential order, wherein said 
processor has a maximum number of instructions that can be
dispatched for execution during each processor cycle, said 
system comprising: 

means for fetching a subset of said plurality of sequential 
instructions from said memory for execution by said 
processor; 

means for determining whether said subset of said plu- 
rality of sequential instructions includes an unresolved 
branch instruction; 

means, responsive to a determination that said subset of 
said plurality of sequential instructions includes an 
unresolved branch instruction, for predicting if a 
branch indicated by said branch instruction will be 
taken; 

means for fetching at least one nonsequential target 
instruction indicated by said branch instruction from 
said memory in response to said prediction; 

means for determining whether or not said maximum 
predetermined number of instructions can be dis- 
patched for execution during a processor cycle follow- 
ing said branch prediction from among sequential 
instructions preceding said branch instruction and said 
at least one target instruction without dispatching any 
sequential instructions which follow said branch 
instruction in said sequential order; 

means, responsive to a determination that said predeter- 
mined maximum number of instructions can be dis- 
patched during said processor cycle following said 
branch prediction from among sequential instructions 
preceding said branch instruction and said at least one 
target instruction without dispatching any sequential 
instructions which follow said branch instruction, for 
dispatching said predetermined maximum number of 
instructions for execution; 

means, responsive to a determination that said predetermined
maximum number of instructions cannot be
dispatched during said processor cycle following said 
branch prediction without dispatching any sequential 
instructions which follow said branch instruction, for 
speculatively dispatching a sequential instruction 
which follows said branch instruction for execution; 

responsive to refutation of said branch prediction,
means for cancelling said fetch of said target instruction; and

means for executing said sequential instruction which
follows said branch instruction, wherein a performance
penalty for a mispredicted branch is minimized.
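
The dispatch decision recited in claim 7 can be illustrated with a minimal Python sketch. It is not the claimed hardware: the names `Instr`, `select_dispatch`, `pre_branch`, `targets`, and `fall_through` are hypothetical, and the model assumes a branch already predicted taken, with candidate instructions represented by simple name strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    name: str
    # True only for an instruction taken from the sequential path
    # that follows a predicted-taken branch (hypothetical flag).
    speculative: bool = False


def select_dispatch(pre_branch: List[str], targets: List[str],
                    fall_through: List[str], max_dispatch: int) -> List[Instr]:
    """Pick instructions for the cycle following a taken prediction."""
    # First choice: instructions preceding the branch plus fetched targets.
    chosen = [Instr(n) for n in (pre_branch + targets)][:max_dispatch]
    # If the dispatch width is not filled, top up with sequential
    # instructions that follow the branch, marked speculative.
    free = max_dispatch - len(chosen)
    chosen += [Instr(n, speculative=True) for n in fall_through[:free]]
    return chosen
```

When the target instruction has not yet arrived, `select_dispatch(["add"], [], ["sub", "mul"], 2)` fills the second slot with `sub` marked speculative; when the target is available, the sequential path after the branch is left untouched.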

8. The system for executing instructions within a processor
of claim 7, wherein each speculatively dispatched
instruction is identified by a state of an associated speculative
indicator, said system further comprising:

means for resetting a speculative indicator associated with 
said sequential instruction which follows said branch 
instruction in response to refutation of said branch 
prediction. 
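
The speculative indicator of claim 8 can be pictured as one bit per dispatched instruction. The following Python sketch is an illustration only; `DispatchedEntry` and `reset_speculative_indicators` are hypothetical names, and the list of entries stands in for whatever structure the processor uses to track dispatched instructions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DispatchedEntry:
    name: str
    speculative: bool  # the "speculative indicator" (assumed to be one bit)


def reset_speculative_indicators(entries: List[DispatchedEntry]) -> int:
    """Clear the indicator on every speculative entry once the taken
    prediction is refuted; return how many indicators were reset."""
    cleared = 0
    for entry in entries:
        if entry.speculative:
            entry.speculative = False  # instruction is now on the correct path
            cleared += 1
    return cleared
```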

9. The system for executing instructions within a processor
of claim 7, said system further comprising means for
fetching additional sequential instructions according to said 
sequential order. 

10. The system for executing instructions within a processor
of claim 7, said system further comprising:

means for cancelling said sequential instruction which 
follows said branch instruction in response to resolu- 
tion of said branch as taken; and 

means for halting dispatch of subsequent sequential 
instructions.

11. The system for executing instructions within a processor
of claim 7, said system further comprising:

means for cancelling said sequential instruction which 
follows said branch instruction in response to a failure
to resolve said branch prediction during a processor
cycle in which said target instruction is fetched; and

means for halting dispatch of subsequent sequential 
instructions. 
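
Claims 10 and 11 cover the two cases in which the speculatively dispatched sequential instruction is discarded. The Python sketch below encodes that decision under stated assumptions: `Resolution` and `handle_fall_through` are hypothetical names, the branch state is sampled during the cycle in which the target instruction is fetched, and continuing sequential dispatch after a refuted prediction is an assumption drawn from claims 7 and 9 rather than an explicit limitation.

```python
from enum import Enum
from typing import Tuple


class Resolution(Enum):
    UNRESOLVED = 0   # branch not resolved during the target-fetch cycle
    TAKEN = 1        # prediction confirmed
    NOT_TAKEN = 2    # prediction refuted


def handle_fall_through(resolution: Resolution) -> Tuple[bool, bool]:
    """Return (keep_speculative_instruction, continue_sequential_dispatch)."""
    if resolution is Resolution.NOT_TAKEN:
        # Refuted prediction: the sequential instruction is executed.
        return True, True
    # Resolved taken (claim 10) or still unresolved (claim 11):
    # cancel the sequential instruction and halt sequential dispatch.
    return False, False
```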

12. The system for executing instructions within a pro- 
cessor of claim 7, wherein said means for predicting com- 
prises a dynamic branch prediction mechanism. 
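
Claim 12 does not specify which dynamic branch prediction mechanism is used. As one common example, and purely as an assumption, a table of 2-bit saturating counters indexed by instruction address can be sketched in Python as follows; `TwoBitPredictor`, the table size, and the word-aligned indexing are all hypothetical choices.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (0..3); values >= 2 predict taken."""

    def __init__(self, entries: int = 16):
        # entries is assumed to be a power of two so a mask can index the table.
        self.table = [1] * entries  # start at "weakly not taken"
        self.mask = entries - 1

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the low two address bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

Each resolved branch calls `update`, so the prediction for a given address adapts to the branch's recent behaviour instead of being fixed at compile time.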

13. A data processing system, comprising: 

a memory which stores a plurality of instructions arranged
in a sequential order;

a processor having a predetermined maximum number of
instructions that can be dispatched for execution during
each processor cycle, said processor including:
means for fetching a subset of said plurality of sequen- 
tial instructions from said memory for execution by 
said processor; 
means for determining whether said subset of said 
plurality of sequential instructions includes an unre- 
solved branch instruction; 
means, responsive to a determination that said subset of
said plurality of sequential instructions includes an 
unresolved branch instruction, for predicting if a 
branch indicated by said branch instruction will be 
taken; 

means for fetching at least one nonsequential target 
instruction indicated by said branch instruction from 
said memory in response to said prediction; 

means for determining whether or not said maximum
predetermined number of instructions can be dispatched
for execution during a processor cycle following
said branch prediction from among sequential
instructions preceding said branch instruction
and said at least one target instruction without dispatching
any sequential instructions which follow
said branch instruction in said sequential order;

means, responsive to a determination that said predetermined
maximum number of instructions can be
dispatched during said processor cycle following 
said branch prediction from among sequential 
instructions preceding said branch instruction and 
said at least one target instruction without dispatching
any sequential instructions which follow said
branch instruction, for dispatching said predeter- 
mined maximum number of instructions for execu- 
tion; 

means, responsive to a determination that said prede- 
termined maximum number of instructions cannot be 
dispatched during said processor cycle following 
said branch prediction without dispatching any 
sequential instructions which follow said branch 
instruction, for speculatively dispatching a sequential
instruction which follows said branch instruction
for execution;

responsive to refutation of said branch prediction,
means for cancelling said fetch of said target instruc- 
tion; and 

means for executing said sequential instruction which 
follows said branch instruction, wherein a performance
penalty for a mispredicted branch is minimized.

14. The data processing system of claim 13, wherein each 
speculatively dispatched instruction is identified by a state of 
an associated speculative indicator, said data processing 
system further comprising: 

means for resetting a speculative indicator associated with 
said sequential instruction which follows said branch 
instruction in response to refutation of said branch 
prediction. 

15. The data processing system of claim 13, said data
processing system further comprising means for fetching
additional instructions according to said sequential order. 

16. The data processing system of claim 13, said data 
processing system further comprising: 

means for cancelling said sequential instruction which 
follows said branch instruction in response to resolution
of said branch as taken; and

means for halting dispatch of subsequent sequential 
instructions. 

17. The data processing system of claim 13, said data 
processing system further comprising: 

means for cancelling said sequential instruction which 
follows said branch instruction in response to a failure 
to resolve said branch prediction during a processor 
cycle in which said at least one target instruction is
fetched; and 

means for halting dispatch of subsequent sequential 
instructions. 

18. The data processing system of claim 13, wherein said 
means for predicting comprises a dynamic branch prediction
mechanism.






